Introduction

Purpose: Introduce participants to the unix command line to increase efficiency and reproducibility.

Background: Scientific research, and especially scientific synthesis, requires extensive computing, which typically involves creation of hudreds or thousands of data files, analytical processes, and many products such as graphs, model outputs, maps, images, and more. Managing this complexity is often really hard, and researchers can easily lose track of just what they did, when, and why. Thus, traditional approaches to managing the research process often fail the reproducibility test, as even the original investigators can’t repeat their initial process.

versions - From PhDComics

Thus, the command line. The command-line provides two major advantages to the researcher:

These advantages come at the cost of:

Learning outcomes

  1. To understand the advantages of the command line over graphical interfaces
  2. To understand the basic philosophy of Unix
  3. To learn core Unix commands for navigation and file management

The Command Line

Command Line Command line interpreters come in many flavors. We will focus on Unix shell syntax, but even with Unix there is tremendous diversity. Learning all of the nuances of one shell can take years, but you will also find many variants on the servers you encounter.

Draw Bridge

In this lesson, you will learn the power of the command line. But, you will undoubtedly also internalize that computers are exceedingly literal. You type, they do. Or, more commonly, you type, they give an error. My experience has been that I am far more productive when my attitude towards the computer is:

  • Patient
  • Persistent
  • Systematic
  • Incremental

The operating system

OS Diagram

OS Diagram

OSes

OSes

All about the X

UNIX logo

UNIX logo

Some Unix hallmarks

Dilbert cartoon

Dilbert cartoon

  • Multi-user, multi-process
  • Separation of engine from interface
  • Small tools for specific jobs
  • Culture of text files and streams

The Command Line

Screenshot of SQL DB Browser GUI

Screenshot of SQL DB Browser GUI

  • GUI: Graphical user interface
  • CLI: Command-line interface

CLIs let you get things done

Why the CLI is worth learning

  • Typically much more extensive access to features, commands, options
  • Command statements can be written down, saved (scripts!)
  • Easier automation
  • Much “cheaper” to do work on a remote system (no need to transmit all the graphical stuff over the network)

Let’s connect

jones@powder:~$ ssh jones@aurora.nceas.ucsb.edu

which should give you a terminal window showing aurora as the host name:

Files and directories

In the following code examples, you need to type the command, but not include the command prompt (e.g., jones@aurora:~$) which just shows that the computer is ready to accept a command.

We’ll start by:

  • creating two directories with mkdir (make directory)
  • create a simple text file using echo
  • Show the contents of that file using cat (concatenate)
jones@aurora:~$ mkdir oss
jones@aurora:~$ mkdir oss/data
jones@aurora:~$ echo "# Tutorial files related to OSS" > oss/README.md
jones@aurora:~$ cat oss/README.md
# Tutorial files related to OSS
jones@aurora:~$ 

Next, let’s copy another file and look around in the directories:

  • copy a file into your directory with cp (copy)
  • change our working directory to that newly created directory using cd (change directory)
  • list the files in the directory with ls (list)
  • look where we are in the filesystem using pwd (print working directory)
  • get an overview of the directory contents using tree
jones@aurora:~$ cp /tmp/plotobs.csv oss/data
jones@aurora:~$ cd oss
jones@aurora:~/oss$ ls
data  README.md
jones@aurora:~/oss$ pwd
/home/jones/oss
.
├── data
│   └── plotobs.csv
└── README.md

1 directory, 2 files
jones@aurora:~/oss$

Now, let’s create another directory with two files so that we can demonstrate removing files and directories.

  • create another two directories named sites and lakes using mkdir
  • create 3 files in each directory using echo
  • examine our file structure using tree
  • move or rename a file using mv (move)
  • change our working directory to the sites directory using cd and check it with pwd
  • remove two of the files using rm (remove) and list files with ls
  • change directory back to the parent and check it with pwd
  • remove the sites directory using rmdir, which will produce and error because the directory contains files
    • rmdir can only remove empty directories
  • try again, using a command-line switch rm -r sites to recursively remove the directory and its files
jones@aurora:~/oss$ mkdir sites lakes
jones@aurora:~/oss$ echo "Site 1 Info" > sites/site1.txt
jones@aurora:~/oss$ echo "Site 2 Info" > sites/site2.txt
jones@aurora:~/oss$ echo "Site 3 Info" > sites/site3.txt
jones@aurora:~/oss$ echo "Lake Mary" > lakes/LakeMary.md
jones@aurora:~/oss$ echo "Lake Hunter" > lakes/HunterLake.md
jones@aurora:~/oss$ tree .
.
├── data
│   └── plotobs.csv
├── lakes
│   ├── HunterLake.md
│   └── LakeMary.md
├── README.md
└── sites
    ├── site1.txt
    ├── site2.txt
    └── site3.txt

3 directories, 7 files
jones@aurora:~/oss$ mv lakes/HunterLake.md lakes/LakeHunter.md
jones@aurora:~/oss$ ls lakes
LakeHunter.md  LakeMary.md
jones@aurora:~/oss$ cd sites
jones@aurora:~/oss/sites$ pwd
/home/jones/oss/sites
jones@aurora:~/oss/sites$ rm site2.txt site3.txt
jones@aurora:~/oss/sites$ ls
site1.txt
jones@aurora:~/oss/sites$ cd ..
jones@aurora:~/oss$ pwd
/home/jones/oss
jones@aurora:~/oss$ rmdir sites
rmdir: failed to remove 'sites': Directory not empty
jones@aurora:~/oss$ rm -r sites
jones@aurora:~/oss$ ls
data  lakes  README.md
jones@aurora:~/oss$

Note the use of the single dot . and the double dot .. symbols in these commands. A single dot . represents the current directory, and a double dot .. represents the parent directory. One can also use the tilde symbol ~ to represent the your home directory, which is main directory where your files will reside.

Let’s look at our text file

  • cat print file(s)
  • head print first few lines of file(s)
  • tail print last few lines of file(s)
  • grep search for matching lines of file(s)
  • less “pager” – view file interactively
jones@aurora:~/oss$ head data/plotobs.csv 
obsid,siteid,plot,date_sampled,sciname ,diameter,condition
1,1,A,6/13/11,Abies lasiocarpa,31.84,normal
2,1,A,6/13/11,Picea engelmannii,3.21,dry
3,1,A,6/13/11,Picea engelmannii,7.2,dry
4,1,A,6/13/11,Picea engelmannii,11.62,dry
5,1,A,6/13/11,Picea engelmannii,11.25,dry
6,1,A,6/13/11,Picea engelmannii,13.16,normal
7,1,A,6/13/11,Picea engelmannii,18.6,normal
8,1,A,6/13/11,Picea engelmannii,23.62,dry
9,1,A,6/13/11,Picea engelmannii,31.75,normal
jones@aurora:~/oss$ tail data/plotobs.csv 
3287,32,B,6/10/12,Pseudotsuga menziesii,4.38,normal
3288,32,B,6/10/12,Pseudotsuga menziesii,3.09,dry
3289,32,B,6/10/12,Jamesia americana,7.98,dry
3290,32,B,6/10/12,Abies lasiocarpa,10.85,normal
3291,32,B,6/10/12,Abies lasiocarpa,13.55,dry
3292,32,B,6/10/12,Abies lasiocarpa,17.26,normal
3293,32,B,6/10/12,Abies lasiocarpa,21.65,dry
3294,32,B,6/10/12,Abies lasiocarpa,17.8,dry
3295,32,B,6/10/12,Abies lasiocarpa,23.4,normal
3296,32,B,6/10/12,Abies lasiocarpa,25.79,normal
jones@aurora:~/oss$ grep Sambucus data/plotobs.csv
13,1,A,6/13/11,Sambucus racemosa,3.83,dry
44,1,A,6/10/12,Sambucus racemosa,17.04,dry
75,1,B,6/13/11,Sambucus racemosa,3.39,dry
91,1,B,6/10/12,Sambucus racemosa,19.53,dry
116,2,A,6/13/11,Sambucus racemosa,1.4,dry
147,2,A,6/10/12,Sambucus racemosa,20.19,dry
...
jones@aurora:~/oss$

Permissions

Working on syntheis projects means collaborating, which is really easy on unix because multiple people can use the same computer at the same time. Right now, we are all using aurora at the same time, running commands in our own part of the file storage system. But what if we want to share files? Unix lets each person control who can access their files through a set of permissions which can be seen by doing a long listing with ls -l:

jones@aurora:~/oss$ ls -l README.md 
-rw-rw-r-- 1 jones staff 32 Jul 10 01:10 README.md
jones@aurora:~/oss$ 

You can interpret the permissions as follows:

File permissions

File permissions

All files have permissions and ownership

  • Change permissions: chmod, using for example chmod o+r README.md
  • Change ownership: chown

To see someone else’s files, they have to permit you to have the proper permissions. For example, you can examine the contents of one of my data files using the full path to the file, as long as you have the needed file permissions:

jones@aurora:~/oss$ cat ~jones/oss/lakes/LakeMary.md 
Lake Mary
jones@aurora:~/oss$

Getting help

I will ask for help before my problems get out of control

I will ask for help before my problems get out of control

  • <command> -h, <command> --help
  • man, info, apropos, whereis
  • Search the web!

General command syntax

  • $ command [options] [arguments]

where command must be an executable file on your PATH

  • echo $PATH

and options can usually take two forms

  • short form: -h
  • long form: --help

Cheatsheets for remembering it all

Linux/Unix Cheetsheet: http://cheatsheetworld.com/programming/unix-linux-cheat-sheet/

Text editing

Some editors

  • vim
  • emacs
  • nano

$ nano lakes/LakeMary.md

A sampling of useful commands

  • wc count lines, words, and/or characters
  • diff compare two files for differences
  • sort sort lines in a file
  • uniq report or filter out repeated lines in a file

Get into the flow, with pipes

stdin, stdout, stderr

stdin, stdout, stderr

$ ls -1 lakes | wc -l
$ ls -1 lakes | wc -l > lakecount.txt
$ diff <(sort file1.txt) <(sort file2.txt)
$ ls foo 2>/dev/null
$ tee

A sampling of more advanced utilities

  • grep search files for text
  • sed filter and transform text
  • find advanced search for files/directories
  • cut extract parts of files like columns
  • join merge files using a common shared column

grep

The file pattern searcher

The file pattern searcher

grep

Show all lines containing “bug” in my R scripts

$ grep bug *.R

Now count the number of occurrences per file

$ grep -c bug *.R

sed

The high tech filter

The high tech filter

sed

Remove all lines containing “bug”!

$ sed '/bug/d' myscript.R

Call them buglets, not bugs!

$ sed 's/bug/buglet/g' myscript.R

Actually, only do this on lines starting with

$ sed '/#/ s/bug/buglet/g' myscript.R

find

Search for stuff

Search for stuff

Like Rover, the Windows search dog. But more useful.

find

Show me my pdfs!

$ find . -iname '*.pdf'

Which files are larger than 10GB?

$ find . -size +10G -ls